Communication-Avoiding Parallel Algorithms for Solving Triangular Matrix Equations
Abstract
In this work, an algorithm for solving triangular systems of equations with multiple right-hand sides is presented. This problem, commonly referred to as TRSM, is very important in dense linear algebra because it is a subroutine of most matrix decompositions, such as LU or QR. To improve performance over the standard iterative algorithms for TRSM, block-wise inversion paired with triangular matrix multiplications is used. To perform the inversion, the lower triangular form of the matrix is exploited and a recursive scheme is applied to further decrease communication cost. As a result, the latency of the algorithm decreases while the bandwidth cost and floating-point operation count stay asymptotically the same. Concretely, latency is decreased by a factor of p^{2/3}/log p for a significant range of relative matrix sizes when working with p processors. The proposed method is implemented and its performance is benchmarked against the widely used ScaLAPACK [1] library. The results show promising tendencies for the inversion, with a maximal speedup of 1.7 over ScaLAPACK for 4096 processors. Because the triangular matrix multiplications perform worse than the triangular solve, no overall improvement has been achieved yet.
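The idea of replacing small triangular solves with block inversion followed by matrix multiplication can be illustrated with a short sequential sketch. This is not the parallel algorithm of the paper: it assumes NumPy, runs on a single process, and the function names `invert_lower_triangular` and `blocked_trsm`, as well as the `base` and `block` parameters, are hypothetical names chosen for illustration. The recursion uses the standard identity that the inverse of a lower triangular matrix [[A, 0], [B, C]] is [[A^-1, 0], [-C^-1 B A^-1, C^-1]].

```python
import numpy as np

def invert_lower_triangular(L, base=2):
    """Recursively invert a lower triangular matrix (illustrative sketch).

    Uses inv([[A, 0], [B, C]]) = [[inv(A), 0], [-inv(C) B inv(A), inv(C)]].
    """
    n = L.shape[0]
    if n <= base:
        # Small base case: fall back to a dense inverse.
        return np.linalg.inv(L)
    h = n // 2
    A_inv = invert_lower_triangular(L[:h, :h], base)
    C_inv = invert_lower_triangular(L[h:, h:], base)
    out = np.zeros_like(L, dtype=float)
    out[:h, :h] = A_inv
    out[h:, h:] = C_inv
    # Off-diagonal block needs only matrix multiplications.
    out[h:, :h] = -C_inv @ L[h:, :h] @ A_inv
    return out

def blocked_trsm(L, B, block=2):
    """Solve L X = B for lower triangular L by inverting diagonal blocks."""
    n = L.shape[0]
    X = np.array(B, dtype=float)  # working copy of the right-hand sides
    for s in range(0, n, block):
        e = min(s + block, n)
        # Replace the small triangular solve by inversion + multiplication.
        D_inv = invert_lower_triangular(L[s:e, s:e])
        X[s:e] = D_inv @ X[s:e]
        # Update the remaining right-hand sides with the solved block.
        X[e:] -= L[e:, s:e] @ X[s:e]
    return X
```

In a distributed setting, the appeal of this structure is that the multiplications can be parallelized with fewer synchronization steps than a row-by-row substitution; the sketch above only shows the arithmetic, not the data distribution or communication pattern.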
Similar resources
Parallel Algorithms and Condition Estimators for Standard and Generalized Triangular Sylvester-Type Matrix Equations
We discuss parallel algorithms for solving eight common standard and generalized triangular Sylvester-type matrix equations. Our parallel algorithms are based on explicit blocking, 2D block-cyclic data distribution of the matrices and wavefront-like traversal of the right hand side matrices while solving small-sized matrix equations at different nodes and updating the rest of the right hand side...
Optimal Dag Partitioning for Partially Inverting Triangular Systems
An approach for solving sparse triangular systems of equations on highly parallel computers employs a partitioned representation of the inverse of the triangular matrix so that the solution can be obtained by a series of matrix-vector multiplications. This approach requires a number of global communication steps that is proportional to the number of factors in the partitioning. The problem of n...
Computational method based on triangular operational matrices for solving nonlinear stochastic differential equations
In this article, a new numerical method based on triangular functions for solving nonlinear stochastic differential equations is presented. For this, the stochastic operational matrix of triangular functions for the Itô integral is determined. The computation of the presented method is very simple and attractive. In addition, convergence analysis and numerical examples that illustrate accuracy and eff...
A high performance two dimensional scalable parallel algorithm for solving sparse triangular systems
Solving a system of equations of the form Tx = y, where T is a sparse triangular matrix, is required after the factorization phase in the direct methods of solving systems of linear equations. A few parallel formulations have been proposed recently. The common belief in parallelizing this problem is that the parallel formulation utilizing a two dimensional distribution of T is unscalable. In th...
Scalable Parallel Algorithms for Solving Sparse Systems of Linear Equations
We have developed a highly parallel sparse Cholesky factorization algorithm that substantially improves the state of the art in parallel direct solution of sparse linear systems—both in terms of scalability and overall performance. It is a well known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. However, it had been a challenge to dev...